Andys Binary Folding Editor

Introduction

The name "Andys Binary Folding Editor" is currently a lie. At present, this program only allows structured browsing, with no actual editing per se.

This program is designed to take in a set of binary files, and with the aid of an initialisation file, decode and display the structures within them. BE is particularly suited to displaying non-variable length structures within the files.

This makes examination of known file types easy, and allows rapid and reliable navigation of memory dumps.

Command line arguments

  usage: be [-w width] [-h height]
            [-i inifile] {-I incpath} {-D symbol}
            [-d defn] [-a addr]
            [-y symfile] {binfile[@addr]}
  flags: -w width      screen width
         -h height     screen height
         -i inifile    override default initialisation file
         -I incpath    append include path(s) for use by inifile
         -D symbol     pre-$define symbol(s) for use by inifile
         -d defn       initial definition to use (default: main)
         -a addr       initial address to use (default: 0)
         -y symfile    input symbol table file
         binfile@addr  binary file(s) (with optional address, default: 0)

The -w and -h arguments can be used to try to override the current screen size. This doesn't work on UNIX, but does on OS/2.

The -i flag overrides the default initialisation file.

The -I flag affects the operation of the include command in the initialisation file.

The -D flag allows the definition of symbols which may be accessed via the $ifdef and similar directives in the initialisation file.

The initial structure definition and address to decode may be overridden with the -d and -a flags. Normally BE starts by looking up the definition of a 'main' structure, and decoding the data at address 0 as such.

A symbol table may be specified using the -y flag. Each line of the symbol table is of the form :-

  symbolname    472484aa

Note that the address is in hex, and is not preceded by 0x. This conveniently matches the symbol table layout generated by the ARM linker.
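As an illustration of the symbol table format, here is a minimal Python sketch that reads such lines into a dictionary. The function name parse_symbol_table is made up for this example, and skipping blank or malformed lines is an assumption, not documented BE behaviour.

```python
def parse_symbol_table(lines):
    """Parse 'symbolname hexaddress' lines into a dict of name -> address."""
    symbols = {}
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # assumption: blank or malformed lines are skipped
        name, hex_addr = parts
        symbols[name] = int(hex_addr, 16)  # address is hex, with no 0x prefix
    return symbols


table = parse_symbol_table(["symbolname    472484aa"])
print(hex(table["symbolname"]))  # prints 0x472484aa
```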

Multiple input binary files can be specified, and they should be loaded at non-overlapping address ranges.

Typical invocations of BE might be :-

  be -y gizmo.sym gizmo.rom gizmo.ram@0x8000

  be picture.bmp

The initialisation file

One of the first things BE does is to find and load the initialisation file, and this tells BE the layout of various file formats and the structures within them.

Under OS/2 or Windows, BE finds the initialisation file by searching along the path for an .INI file with the same name as the executable. Under UNIX, BE looks for ~/.berc (or ~/.xxrc if the be executable is renamed to xx). BE can be made to look elsewhere using the -i command line option.

This initialisation file may contain C or C++ style comments.

Also, $define, $ifdef, $ifndef, $else, $endif and $error are supported, as a form of a pre-processing/conditional processing step. The -D command line option may be used to pre-$define such conditional processing symbols.

If BE is running on OS/2, then OS2 is pre-$defined. If running on Windows NT or Windows 95, then WIN32 is pre-$defined. If running on a type of UNIX, then UNIX is pre-$defined. If running specifically on AIX, then AIX is pre-$defined. Either BE or LE will be pre-$defined, depending upon whether BE is running on a big-endian or little-endian machine. These $defines allow you to write initialisation files with sensible defaults, relevant for the current environment.

An include directive is supported, and included files will be searched for by looking in the current directory, then along an internal include path, and finally along the PATH environment variable. The internal include path is usually empty, but may be appended to by the use of the -I command line option.
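The search order for included files can be pictured with the following Python sketch. It is only an illustration of the order described above: the function find_include and its parameters are hypothetical, and how BE treats empty path entries is an assumption.

```python
import os


def find_include(name, include_path, env_path):
    """Return the first existing candidate for an included file, or None.

    include_path: list of directories given via -I (usually empty)
    env_path:     the value of the PATH environment variable
    """
    directories = ["."] + list(include_path) + env_path.split(os.pathsep)
    for directory in directories:
        if not directory:
            continue  # assumption: empty PATH entries are ignored
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate):
            return candidate
    return None
```

A call such as find_include("anotherfile.ini", ["/usr/local/be"], os.environ.get("PATH", "")) would try the current directory first, then /usr/local/be, then each directory on the PATH.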

The initialisation file contains commands to set the default data display attributes, structure definitions, and include statements.

As BE processes the initialisation file, it generates warnings (such as undefined symbol table symbol), and error messages into an internal buffer. If there are no errors, then this buffer is discarded. If there are errors, then all the warnings and errors are listed, and BE aborts.

Numbers

Wherever the initialisation file calls for a number, the following variants may be used :-

number: The number can be signed, specified in binary (eg: 0b1101), octal (eg: 0o15), decimal (eg: 13) or hex (eg: 0x0d).
addr "symbolinthesymboltable": if a symbol table is loaded, and the symbol can be found then the result is the numeric value of the symbol. Otherwise a warning is generated, and the result is the number 0xffffffff.
sizeof DEFN: this gives the size in bytes of the earlier defined structure DEFN. If DEFN isn't already defined, then an error results.

Expressions may be constructed by use of brackets and also the following operators, with usual C language meanings, listed highest priority first :-

Unary operators: +, -, ~, !
Multiplicative operators: /, *, %, &
Additive operators: +, -, |, ^

  eg: addr "tablebase" + 4 * sizeof RGB

Such numeric expressions can be used when BE prompts for a number.
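The number forms above happen to match Python's own literal prefixes, so a small Python sketch can show them side by side, along with a worked evaluation of the example expression. The values given for tablebase and sizeof RGB are invented for illustration. Note that BE groups & with the multiplicative operators and | and ^ with the additive ones, which differs from Python, so the worked example sticks to + and *.

```python
def parse_number(text):
    """Parse a signed binary, octal, decimal or hex literal."""
    return int(text, 0)  # base 0: infer the base from a 0b/0o/0x prefix


# The four spellings of thirteen from the text all give the same value.
values = [parse_number(s) for s in ("0b1101", "0o15", "13", "0x0d")]

# Worked example of:  addr "tablebase" + 4 * sizeof RGB
tablebase = 0x8000   # hypothetical value found in the symbol table
sizeof_rgb = 3       # hypothetical size in bytes of an RGB structure
result = tablebase + 4 * sizeof_rgb

print(values, hex(result))  # prints [13, 13, 13, 13] 0x800c
```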

Commands to set the default data display attributes

When the program starts parsing the initialisation file, the default data display attributes are le unsigned hex nomul abs nonull nocode nolj noseg.

To change this default setting, just include one or more of the following keywords in the file :-

be - read multibyte values from memory in a big-endian fashion.
le - read multibyte values from memory in a little-endian fashion.
signed - when fetching numeric values sign extend them, and when displaying numerically show '+signedvalue' or '-signedvalue'.
unsigned - when fetching numeric values zero extend them, and when displaying numerically show 'unsignedvalue'.
asc - set display mode to ASCII.
ebc - set display mode to EBCDIC.
bin - set display mode to binary.
oct - set display mode to octal.
dec - set display mode to decimal.
hex - set display mode to hex.
sym - set display mode to symbolic. ie: look up the value in the symbol table, and if found, display symbol+hexoffset, else display value in hex.
null - allow following of 0 pointers.
nonull - disallow following of 0 pointers.
seg - cope with 16:16 segmented pointers.
noseg - pointers are not segmented.
mul - pointer values should be multiplied by the size of the data type being pointed to.
nomul - pointer values are given in regular byte addresses.
abs - pointer values are absolute.
rel - pointer values are to be considered relative to their own addresses.
code - specify that numeric value is actually a code address.
nocode - specify that numeric value is not a code address.
lj - perform ARM specific long-jump interpretation of code addresses.
nolj - don't do long-jump interpretation.

Map definitions

These define a mapping between symbolic names and numeric values. A typical mapping definition in the initialisation file might be :-

  map compression_type
    {
    "uncompressed" 1
    "huffman"      2
    "lzw"          3
    }

If the numeric value on display matches the value given, then it can be converted to the textual description.

Bitfields may be achieved in the following fashion :-

  map pending_events
    {
    "reconfiguration" 0x0001 : 0x0001
    "flush_cache"     0x0002 : 0x0002
    "restart_io"      0x0004 : 0x0004
    }

The : symbol introduces an additional mask. The number to string conversion algorithm inside BE works like this :-

  for each maplet in the map
    if ( value & maplet.mask ) == maplet.value then
      display the maplet.name
  if some unexplained bits left over then
      display the remaining value in hex

So, it is possible to have multiple field decodes from a single value :-

  map twobitfields
    {
    "green" 0x0001 : 0x000f
    "blue"  0x0002 : 0x000f
    "red"   0x0003 : 0x000f
    "small" 0x0100 : 0x0f00
    "large" 0x0200 : 0x0f00
    }

The value 0x0243 would be converted to red|large|0x40.
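The conversion algorithm can be sketched in Python as follows. This is not BE's actual code. It assumes that a maplet written without a mask compares all 32 bits, that the "unexplained bits" are those not covered by the mask of any matching maplet, and that a value with no matches at all is shown in hex.

```python
FULL_MASK = 0xFFFFFFFF  # assumption: a maplet with no ": mask" compares all bits


def decode_map(value, maplets):
    """Convert value to text using a list of (name, value, mask) maplets."""
    parts = []
    explained = 0
    for name, want, mask in maplets:
        if (value & mask) == want:
            parts.append(name)
            explained |= mask  # these bits are now accounted for
    leftover = value & ~explained & FULL_MASK
    if leftover or not parts:
        parts.append(hex(leftover))  # show whatever was not explained
    return "|".join(parts)


twobitfields = [
    ("green", 0x0001, 0x000F),
    ("blue",  0x0002, 0x000F),
    ("red",   0x0003, 0x000F),
    ("small", 0x0100, 0x0F00),
    ("large", 0x0200, 0x0F00),
]

print(decode_map(0x0243, twobitfields))  # prints red|large|0x40
```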

Structure definitions

Structures are a list of at OFFSET clauses and field definitions. When the structure definition is processed, then the current-offset is initialised to 0.

An at OFFSET clause moves the current-offset to the specified numeric value.

A field definition defines a field which lives at the current-offset into the structure. After definition of the field, the current-offset is moved to the end of the field, so that the next field will immediately follow it (unless another at OFFSET clause is used).

The size of the structure is the largest value that the current-offset ever attains. This is the value returned whenever sizeof DEFN is used as a number.

The at OFFSET clause allows the same areas of a structure to be displayed in more than one way, thus allowing the implementation of unions.

Duplicate definitions of the same named structure are not allowed.

A structure definition may have zero or more fields and/or at OFFSET clauses.
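The way the current-offset moves, and how the structure size falls out of it, can be shown with a short Python sketch. The clause tuples and the layout function are an invented representation for this example, not BE's internal one.

```python
def layout(clauses):
    """Return (fields, size) for a list of clauses.

    Each clause is either ("at", offset) or ("field", name, size_in_bytes).
    fields is a list of (name, offset) pairs.
    """
    offset = 0  # the current-offset starts at 0
    size = 0    # largest value the current-offset ever attains
    fields = []
    for clause in clauses:
        if clause[0] == "at":
            offset = clause[1]
        else:
            _, name, nbytes = clause
            fields.append((name, offset))
            offset += nbytes  # the next field follows immediately
        size = max(size, offset)
    return fields, size


# A union: the same 4 bytes at offset 4 are described in two ways.
fields, size = layout([
    ("field", "tag", 1),
    ("at", 4),
    ("field", "as_n32", 4),
    ("at", 4),
    ("field", "as_bytes", 4),
])
print(fields, size)  # prints [('tag', 0), ('as_n32', 4), ('as_bytes', 4)] 8
```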

Field definitions

Here are some examples of field definitions :-

  n8 asc "initial"
  n8 buf 20 "surname"
  n16 be unsigned dec "age"
  3 pet "pet names"
  3 n16 be unsigned dec "pet costs"
  2 n32 le unsigned hex ptr person "2 pointers to parents"
  2 n32 ptr person null "2 pointers, null legal"
  person "a person"
  n32 sym code "__main"
  1024 n32 unsigned dec "memory as 32 bit words"
  9 n16 map errorcodes "results"

Each example is of the form :-

  opt-count type opt-attrs name

The field describes count data items of the specified type, count is restricted to being >= 1, and if it is > 1, then the field is initially displayed by just showing its type (eg: 10 n32 le unsigned hex "numbers"). When you select the field, you are presented with an element list, with count lines, from which you can select the element you are interested in.

The type of the data is one of n8, n16, n24, n32, buf N or DEFN, where DEFN is the name of a previously defined structure. This type may be considered to be the way in which BE is told the size of the data item concerned. n8, n16, n24 and n32 mean 8, 16, 24 or 32 bit numeric data item. buf N means a buffer of N bytes.

The field has the default data display attributes, unless data display attribute keywords (as defined above) are included in the field definition.
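To make the numeric types and the be/le and signed/unsigned attributes concrete, here is a Python sketch of fetching a value from a block of memory. The fetch function is hypothetical and only illustrates the byte order and sign extension rules described in this document.

```python
def fetch(memory, offset, bits, big_endian=False, signed=False):
    """Fetch an n8/n16/n24/n32 value from a bytes object."""
    nbytes = bits // 8
    raw = memory[offset:offset + nbytes]
    if len(raw) != nbytes:
        raise ValueError("not enough data at offset %d" % offset)
    byteorder = "big" if big_endian else "little"
    # signed=True sign extends, signed=False zero extends
    return int.from_bytes(raw, byteorder, signed=signed)


data = bytes([0x12, 0x34, 0xFF])
print(hex(fetch(data, 0, 16, big_endian=True)))  # prints 0x1234
print(hex(fetch(data, 0, 16)))                   # prints 0x3412
print(fetch(data, 2, 8, signed=True))            # prints -1
```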

In addition to the data display attribute keywords given above is the map MAP attribute, which means display the numeric field by looking up a textual equivalent of the numeric value using the mapping, which must have previously been defined.

The ptr DEFN attribute says that the numeric value is in fact a pointer to a structure of type DEFN. DEFN need not be defined yet in the initialisation file. The mul/nomul attribute described above specifies whether to multiply the pointer value by the size of the data item being pointed to. The null/nonull attribute described above specifies whether this pointer may be followed if the numeric value is 0. The add BASE keyword may be used to add the number BASE to the pointer value. Also, the rel/abs attribute described above specifies whether to add the address of the pointer itself to the numeric value. By using combinations of the pointer keywords, various effects may be achieved :-

n32 ptr DEFN abs: fetch pointer value, and decode DEFN at that address. This case is very common for file format decoding and memory dumps.
n32 ptr DEFN add 0x40000 abs: fetch pointer value, add 0x40000, and decode DEFN at that address. This case can be used to handle multiple memory space problems.
n32 ptr DEFN mul add addr "table" abs: fetch pointer value, multiply by the size of a DEFN, add the address of the table (as determined from the symbol table), and decode the DEFN at that address. This case is typical for when the pointer is in fact a table index.
n32 ptr DEFN rel: fetch pointer value, add address of the pointer itself, and decode the DEFN at that address. When a file consists of a list of variable length structures, where the first field is the size of the structure, this provides a handy way to skip past it to the next.
n32 ptr DEFN add 8 rel: fetch pointer value, add address of the pointer itself, add the numeric value 8 (this can be negative), and decode the DEFN at that address. This case is common for when one structure includes a field which identifies an amount of data to skip before the next structure is seen.
n32 le ptr DEFN abs seg: fetch pointer value (explicitly in little endian order), mangle pointer to account for 16:16 segmented mode, and decode the DEFN at that address.

The procedure for following pointers is :-

fetch the pointer's numeric value.
if nonull and pointer is 0, then don't follow the pointer.
if mul, then multiply the pointer value by the size of the item being pointed to.
if add BASE, then add BASE to the pointer value.
if rel, then add the address of the pointer itself.
if seg, then mangle pointer address to account for the 16:16 segmented mode of x86 processors.
decode and display data item at resultant address.

The seg keyword works by taking the top 16 bits of the pointer value as the segment, the bottom as the offset, and producing a new pointer value which is segment*16+offset. This feature may be of use for decoding large memory model program dumps which have been running on x86 processors running in real mode, or a 16:16 protected mode with a linear selector mapping. Anyone with a sensible file format to decode, or a dump taken from the memory space of a processor of a sensible architecture, can ignore this feature.
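The pointer following procedure, including the seg calculation, can be sketched in Python like this. The function follow_pointer and its parameter names are invented for illustration. It follows the order of steps listed above, and is not taken from BE's source.

```python
def follow_pointer(value, ptr_addr, target_size, *, null=False, mul=False,
                   base=None, rel=False, seg=False):
    """Return the address to decode at, or None if the pointer is not followed.

    value:       the numeric value fetched from the pointer field
    ptr_addr:    the address of the pointer field itself
    target_size: size in bytes of the item being pointed to
    """
    if not null and value == 0:
        return None                  # nonull: don't follow 0 pointers
    addr = value
    if mul:
        addr *= target_size          # pointer is really a table index
    if base is not None:
        addr += base                 # add BASE
    if rel:
        addr += ptr_addr             # relative to the pointer's own address
    if seg:
        segment = (addr >> 16) & 0xFFFF
        offset = addr & 0xFFFF
        addr = segment * 16 + offset  # 16:16 segmented to linear
    return addr


print(hex(follow_pointer(3, 0, 12, mul=True, base=0x2000)))  # prints 0x2024
print(hex(follow_pointer(0x20, 0x100, 12, rel=True)))        # prints 0x120
print(hex(follow_pointer(0x12340010, 0, 12, seg=True)))      # prints 0x12350
```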

The keyword open may be given and this has the effect of increasing the level of detail that is initially displayed. See the description of the level of detail of display feature later in this document. This feature has its problems (bugs), but can be used to ensure that small arrays and short structures are displayed in full without the user having to manually increase the level of detail.

Finally the name of the field must be given.

Include directives

The initialisation file can contain the following, as long as it is outside of any other definition :-

  include "anotherfile.ini"

A sample initialisation file

Here is a snippet from a real initialisation file :-

le unsigned hex abs // set defaults, just to be sure
lj // allow ARM specific symbolic lookup of code addresses

map DE_
  {
  "DP_Pending" -1
  "DS_Success"  0
  "DE_Failure"  1
  }

def DPB
  {
  n32 ptr DPB       "DPB_Next   "
  n32 sym code      "DPB_Address"
  n8 map DC_        "DPB_Number "
  n8                "DPB_Flag2  "
  n8 map SY_        "DPB_Flag   "
  n8 signed map DE_ "DPB_Dsb    "
  n32               "DPB_Safety "
  }

def NOP
  {
  DPB     "NOP_Header"
  n8      "NOP_Spare1"
  n8      "NOP_Spare2"
  n8      "NOP_Spare3"
  n8 dec  "NOP_Period"
  n32 dec "NOP_Value "
  CLK     "NOP_Clock "
  }

def main // the entire memory map
  {
  at addr "noptable"   100 NOP     "noptable  "
  at addr "currentdpb" n32 ptr DPB "currentdpb"
  }

The supplied initialisation file

The supplied initialisation file contains enough definitions to enable you to examine the contents of many image file formats.

These include Windows / OS/2 Bitmaps, Targa files, KIPS files, ZSoft PCX, M-Motion Video, TIFF, ILBM IFF, Compu$erve GIF, RiscOS sprite, IBM PSEG, and OS/2 resource files.

The definitions in the initialisation file are in no way complete, or intended to be a definitive statement of such files contents, but are merely intended to aid in the browsing of the contents of such files.

Limitations of BE make it awkward to decode certain data structures in some files, so the attitude taken is typically 'display as best you can', and where data may be of variable length 'display the first few bytes worth...'.

Using the editor

Although not displayed, the arrow keys, such as Up, Down, PgUp, PgDn, Home and End all work in the obvious ways, traversing the list on display. The Wordstar keys ^E, ^X, ^R, ^C, ^W and ^Z also work.

BE displays the non-obvious keys you may press on the 2nd line of its status area, at the top of the screen.

@X (ie: Alt+X), or q exits the program.

Esc exits the current screen back to the previous level.

f allows you to do a find over the list on display. This only searches as much as the user could see if he were to manually page up and down through the list. The find command is case sensitive. n can be used to repeat the last find.

i allows you to generate a display which only has lines which include a pattern you specify. For example, if you have an array of trace-point events, you can easily generate a list of just trace-points from one module. Similarly, x allows you to generate a display which excludes lines which match the pattern. Esc exits back to the original display.

The keys A,O,I toggle the display of addresses, offsets and array indices.

The r key causes a refresh. BE re-fetches all the data on display. The R key is a slightly more aggressive form of refresh. If an extension providing data to BE was caching data, this type of refresh causes it to drop its cache.

g/l is displayed if you are allowed to change the memory interpretation mode to big or little endian.

s/u is displayed if you are allowed to change the signed display mode to signed or unsigned.

A subset of the keys a/e/b/o/d/h/y/m may be displayed if you are allowed to change the viewing mode to ASCII, EBCDIC, binary, octal, decimal, hex, symbolic or via a mapping table.

+/- is displayed to indicate that the level of detail of display may be increased or decreased. Level 0 means display the data type only. Level 1 means display the first level of data. Levels 2 and above mean display additional levels of detail.

Increasing the level of display can make BE open up an array, and enumerate the elements. eg: 3 n32 to [123,123,456].

Increasing the level of display can also make BE open up a definition, and display the fields. eg: VAR to {"name",123}.

This is capable of opening up the data structure pointed to by a pointer, providing the pointer may be fetched and followed.

Some examples :-

  level 0        level 1        level 2             level 3
  -------        -------        -------             -------
  n32            7
  3 n32          3 n32          [8,9,10]
  VAR            VAR            {"a",1}
  2 VAR          2 VAR          [VAR,VAR]           [{"b",2},{"c",3}]
  n16 ptr VAR    22->VAR        22->{"d",4}
  2 n8 ptr VAR   2 n8 ptr VAR   [33->VAR,44->VAR]   [33->{"e",5},44->{"f",6}]

Enter is displayed if you can press enter to either show the contents of the sub-definition, or to follow a pointer and show the definition there. The Esc key brings you back to where you are now.

Pressing @ will cause BE to prompt for a structure definition name, and then an address. It will then decode the memory at the given address as if it were of the specified structure type.

Extensions

The binary file arguments to BE are normally of the form :-

  filename[@address]

This tells BE to load the file and whenever data at a memory address from address to address+filelength is accessed, to supply the data from the file.
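How binary file arguments map onto a memory space can be illustrated with the Python sketch below. The names parse_binfile_arg and MemoryMap are hypothetical, the address is assumed to be a plain number such as 0x8000, and the range is assumed to run from address up to but not including address+filelength.

```python
def parse_binfile_arg(arg):
    """Split 'filename[@address]' into (filename, address)."""
    filename, sep, addr = arg.rpartition("@")
    if not sep:
        return arg, 0  # no @address given, default is 0
    return filename, int(addr, 0)


class MemoryMap:
    """Supplies bytes from whichever loaded region covers an address."""

    def __init__(self):
        self.regions = []  # list of (start_address, data)

    def load(self, address, data):
        self.regions.append((address, bytes(data)))

    def read_byte(self, address):
        for start, data in self.regions:
            if start <= address < start + len(data):
                return data[address - start]
        return None  # nothing is loaded at this address


print(parse_binfile_arg("gizmo.ram@0x8000"))  # prints ('gizmo.ram', 32768)

mem = MemoryMap()
mem.load(0, b"\x01\x02")     # stands in for the contents of gizmo.rom
mem.load(0x8000, b"\xaa")    # stands in for the contents of gizmo.ram
print(mem.read_byte(0x8000))  # prints 170
```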

However, it is possible to supply binary file arguments of the form :-

  extension!args[@address]

Under OS/2, BE will ensure that BEextension.DLL is loaded. This DLL should be on the LIBPATH and should contain certain entrypoints which will be used by BE. BE then passes the args and address to the extension DLL, which does something of its own choosing with them. The extension DLL can then supply data to BE on request.

Under Windows, provision for extension DLLs also exists. The DLL is located according to the algorithm used by the Win32 LoadLibrary API.

One use of this might be the provision of an extension for handling files too massive to load into memory all at once. The extension could open a file handle and read bytes demanded by BE upon request. This extension could be provided in BEBIGFIL.DLL, and the user could type :-

  be bigfil!verybigfile.dat

Another use might be in live-debug of adapter cards. The extension would provide data bytes from the memory space of the adapter. args could be used to identify the slot the adapter is in.

Yet another use might be providing BE with access to physical or virtual or process specific linear address spaces, perhaps via the use of a device driver. Shared memory windows might give addressability of data structures in other programs.

Also, the surface of a disk or block device could be made accessible via an extension.

Perhaps bytes sent down a communications port could be made to appear as a stream of binary data.

The file bememext.h documents the extension interface. Currently extensions may only be built for the OS/2 version of BE using the IBM C-Set++ compiler, or the Win32 version of BE using MS Visual C++. I anticipate learning about shared library support on the various different types of UNIX, enabling similar tricks to be performed there.

Installation

BE can be found on the You get a selection of executables, and the one to pick depends upon which operating system you wish to run :-

be_os2.exe: Runs on 32 bit OS/2.
be_win.exe: Runs on Windows NT and Windows 95.
be: Runs on AIX.

Installing BE for OS/2

Copy be_os2.exe to be.exe, somewhere on the path.
Copy be.ini to the same directory as be.exe so it can be found.
Optionally copy be.htm to wherever you keep documentation.
Optionally copy be.ico to the same directory as be.exe. This allows BE to have a cute icon when running in the Workplace shell.
Optionally create a Workplace Shell Program Object(s) that references the BE executable. The working directory should be the directory where be.ini can be found.

Installing BE for Windows NT or Windows 95

Copy be_win.exe to be.exe, somewhere on the path.
Copy be.ini to the same directory as be.exe so it can be found.
Optionally copy be.htm to wherever you keep documentation.

Installing BE for UNIX, ie: AIX

Copy the be executable to somewhere like /usr/bin or ~/bin, or wherever on the path you consider appropriate.
Copy be.ini to .berc in your home directory, or make a soft link to a common .berc somewhere from your home directory.
Optionally copy be.htm to wherever you keep documentation.

Unfortunately I don't have continual access to all the platforms, so improvements in one version may not yet be reflected into the others.

Copying

Copying of this program is encouraged, as it is fully public domain. Caveat Emptor.

This documentation is written and maintained by the Binary Editor author, Andy Key

ak@nyangau.aladdin.co.uk